home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Cream of the Crop 25
/
Cream of the Crop 25.iso
/
database
/
convr704.zip
/
CONVERT.DOC
< prev
next >
Wrap
Text File
|
1997-04-28
|
30KB
|
642 lines
CONVERT.DOC 1 Apr 28, 1997
The CONVERT.EXE program converts data between several basic data formats.
Features:
* You can read data in from any of the following data formats:
ASCII-delimited
fixed field
dBase-compatible
* You can write data out in any of the following data formats:
ASCII-delimited
fixed field
dBase-compatible
WKS (Lotus 1-2-3 release 1 compatible)
* For ASCII-delimited files, you can specify the delimiters used between
fields as well as around numeric or character data.
* For dBase input files, you can retain deleted records if you want.
* You can resize variables or drop them entirely if desired.
* You can add fields as desired and assign initial values to them.
* You can specify up to 10 include filters; all records processed must meet
at least one of these filter conditions.
* You can specify up to 10 exclude filters; any records which meet these
filter conditions are dropped.
* For WKS output files, you can process input files bigger than Lotus 1-2-3
itself can handle (1-2-3 limits input records to being 240 characters or
less).
* Cell filters can be created, ignoring, say, any record where the value in
cell 4 is less than 10.
* The program can only handle files with 255 fields or fewer. (Those
obsessed with handling more fields can ask for a version that does so.)
* Handles DOS text files (lines end with CR/LF), Mac text files (lines end
with CR), or Unix text files (lines end with LF).
* Pressing escape stops the program early.
CONVERT.DOC 2 Apr 28, 1997
Data format types:
An ASCII-delimited file is one which typically has double quotes around each
character field (the quotes are optional in CONVERT) and typically commas
between fields. Leading and trailing spaces are removed from character as well
as all other values.
A fixed-field file places each field in the same column positions on each
record. The lengths of the fields are the same from record to record.
As an example, these might be an ASCII-delimited records:
"Economic Bulletin Board","202 482-3870",35
"EBB High-Speed","202 482-2584",100
Fixed-field file records might look like this for the same data:
Economic Bulletin Board 202 482-3870 35
EBB High-Speed 202 482-2584 100
(On files generated on a PC, most fixed-field file records end with a CR/LF
combination. Although these two characters actually add two characters to each
line, most people and programming languages ignore them. Files generated
elsewhere may not have these line terminators. Use the /BINARY option if this
is the case.)
WKS files are supported directly by Lotus 1-2-3 (all versions) as well as most
other spreadsheet programs. DBF files are supported by dBaseIII, dBaseIII+,
dBaseIV, and most other data base management programs (Paradox etc).
CONVERT.DOC 3 Apr 28, 1997
Field-definition file:
Unless you are reading a dBase file, this program requires a field-definition
file to figure out the characteristics for each field and also to set certain
file characteristics. If you're processing an ASCII-delimited input file, the
routine can try to create a field-definition file for you if desired.
The field-definition file can be created with any text editor. The
field-definition file consists of several records with the following fields
separated by spaces. Except for the record type indicator (which must begin in
column 1), all other fields can be placed in any columns:
(1) record type (see below)
(2) length of field on input
(3) number of decimal places for numeric data (if you don't know, put a "?"
here; for non-numeric data, a "0" is fine) on output
(4) length of field on output
Any characters after the field length are treated as comment fields. You would
typically use this to enter the field name or column position or any other
information of use to you.
If you'd like, you can leave out both decimal place count *and* the length of
the output field. (You cannot leave off one and include the other however.) If
they are left out, the number of decimal places is presumed to be "0" and the
length of the output field is presumed to be the same as the length of the
input field.
The data record types accepted by this routine are as follows:
type C = character data (leading spaces are trimmed)
V = verbatim character data (no leading spaces are trimmed)
N = numeric
L = logical (T or F)
D = date (in yyyymmdd format)
M = memo fields (only for dBase input files; ignored on output)
The data fields should be in the order the fields are found in the source file.
Note that for fixed field files, you have to account for every byte in the
file. If you have something like this:
12345678_1_2345678_2_2345678_3 (column positions)
APPLE X Y 12 BANANAS
Even though you may think you only have five fields, the following .DEF file
will NOT work:
; Bad .DEF file: Note does not account for blank spaces
; NOTE: Input and output lengths are the same (mistakenly) so left
; out number of decimal places and output length for each record.
C 8 Fruit1
C 1 Class1
C 1 Class2
N 5 Value
C 11 Fruit2
CONVERT.DOC 4 Apr 28, 1997
You may want the Fruit1 field to be in columns 1 through 8 and Class1 to be in
column 10 but the routine will not know to skip column 9 so it will start
reading Class1 beginning in column 9, Class2 beginning in column 10, etc. To
drop the blank positions, you have to add dummy fields on input and ask for
them to be dropped on output:
; Good .DEF file: Spaces between fields are accounted for
C 8 Fruit1
C 1 0 0 Filler
C 1 Class1
C 1 0 0 Filler
C 1 Class2
C 1 0 0 Filler
N 5 Value
C 1 0 0 Filler
C 11 Fruit2
You can also use the input field length and output field lengths to either drop
fields using other formats (by specifying a zero length for the output field
length) or for creating fields on output (by specifying a zero length for the
input field length). You can also use this to expand on contract a field. For
example, if Fruit1 is 8 characters long but you only want it to occupy 4
characters on output (thus the field would be truncated), specify 8 for the
input field length and 4 for the output field length:
C 8 0 4 Fruit
If the output field length is wider than the input field length, the data
values will be shifted right or left depending on the data type. In general,
numeric fields are shifted right (so extra spaces show up in front of the
number) and all other field types (character, logical, or date) are shifted
left.
You can have the routine create the field-definition file it's using for dBase
and ASCII-delimited files. This is controlled by the /OUTDEF=deffile and
/-OUTDEF parameters.
ASCII-delimited file? Letting the program guess the fields:
Let's say you have some sort of ASCII-delimited file. Do you have to create a
field-definition file? Nope. The program will try to guess the format for you
if you want.
For example, let's say you have a file called MYFILE.PRN and the first records
look like this:
ACIN,970114, 1.49 , 1.5 , 1.49 , 1.495 , 2312512 , 0
AGRO,970114, 1.3 , 1.4 , 1.3 , 1.4 , 68319 , 0
ALPA,970114, .875 , .89 , .875 , .885 , 1805075 , 0
CONVERT.DOC 5 Apr 28, 1997
Well, looking at it, you could probably guess that it has 8 fields per record,
the first of which is character and the others are probably numeric. So you
could create a field-definition file and go with that. But let's see how the
program does on its own.
Looking at the fields, there are apparently no special characters (like
quotation marks) around the character fields. There are also no special
characters around the numeric fields. There are, however, commas between the
fields.
A standard ASCII-delimited file would have quotation marks around the
alphabetic fields. Your's does not have this. As a result, you will have to
specify a /DELIMS= parameter. The format for this is:
/DELIMS=aroundstrings,aroundnums,betweenfields
In your case, there is nothing around the alphabetic fields so "aroundstrings"
is null. There is nothing around numeric fields so "aroundnums" is null. There
are commas between fields so "betweenfields" is a comma. Thus, your /DELIMS=
parameter is:
/DELIMS=,,,
So convert this file to a fixed-field file without having already created a
field-definition file, you would issue the following command:
CONVERT MYFILE.PRN /FROM ASCII /TO FIXED /DELIMS=,,, /-INDEF /OUTDEF
When the program runs, it will, in this case, create a fixed-field file called
MYFILE.FIX as well a field-definition file called MYFILE.DEF. That file would
might like this:
; Field definition file: C:\HOME\PRECIOS.DEF
; Created by CONVERT for FROM ASCII/FROM FIXED
; Created at 13:04:30 on 01-27-1997
; Positions in far left are positions on output
N 10 4 10 COL_001 In( 1: 10) Out( 1: 10)
N 6 0 6 COL_002 In( 11: 16) Out( 11: 16)
N 7 4 7 COL_003 In( 17: 23) Out( 17: 23)
N 8 4 8 COL_004 In( 24: 31) Out( 24: 31)
N 7 4 7 COL_005 In( 32: 38) Out( 32: 38)
N 8 4 8 COL_006 In( 39: 46) Out( 39: 46)
N 11 9 11 COL_007 In( 47: 57) Out( 47: 57)
N 1 0 1 COL_008 In( 58: 58) Out( 58: 58)
You can modified this field-definition file if you'd like. Maybe you want to
change the number of decimal places shown or expand some columns on output or
perhaps eliminate others. Just edit the field-definition file with a text
editor and resave it. Then rerun the CONVERT command, this time specifying
that there *is* a field-definition file:
CONVERT MYFILE.PRN /FROM ASCII /TO FIXED /DELIMS=,,,
CONVERT.DOC 6 Apr 28, 1997
Filters:
CONVERT supports two types of filters; record filters and cell (field) filters.
Record filters:
In general, record filters apply to the record as a whole. They are
specified as unique types of records in the field-definition file.
You may specify up to 10 include record filters and up to 10 exclude record
filters in the field-definition file.
If an include record filter is specified, the input record must contain at
least one of the specified character strings in order to be processed. If an
exclude record filter is specified, any input record which contains any of
the specified character strings will be ignored. Record filters are case
sensitive (capitalization matters) and processed as "or" items (if any filter
is met, the condition is met; record filters are not combined to determine
fulfillment). Record filters are specified in the field-definition file in
the following ways:
/+=string include filter, the string "string" can appear anywhere
/S+=string include filter, the string "string" is at the beginning of
the record
/+S=string include filter, the string "string" is at the end of the
record
/-=string exclude filter, the string "string" can appear anywhere
/S-=string exclude filter, the string "string" is at the beginning of
the record
/-S=string exclude filter, the string "string" is at the end of the
record
For example, if you want to specify in your control file that you only want
records that contain either "Japan" or "France" *and* you want to exclude any
records that begin with an underscore character, you would need to include
the following three filter statements:
/+=Japan
/+=France
/S-=_
The character string can include hexadecimal codes (in the &Hxx format) or
decimal codes (in the \ddd format) if necessary. See BRUCEHEX.DOC file.
CONVERT.DOC 7 Apr 28, 1997
Cell filters:
Cell filters are applied to individual cells (or fields) in the data. If any
cell fails the test for that cell, the entire record is skipped for
additional processing.
Each variable in the field-definition file can include a cell filter.
Filters are specified on the field type records as the last parameters on the
record and are immediately preceded with "||" indicators:
N 8 Weight || > 100
N 4 Height || <= 6
C 10 Name || = Banana
Cell filters can be specified as any one of six relations:
"=" is equal to
"<" is less than
">" is greater than
"<=" is less than or equal to
">=" is greater than or equal to
"<>" is not equal to
You cannot specify ranges. The item to the right of the relation is treated
as a string if the field is non-numeric, otherwise, it's treated as a number.
Do not include quotes around the strings unless you want that as part of the
condition. The value can include hexadecimal codes (in the &Hxx format) or
decimal codes (in the \ddd format) if necessary (see BRUCEHEX.DOC file). It
can also contain spaces. The cell filter is case sensitive (capitalization
matters).
If the field is being created, you can specify an assignment cell filter
(using "=") which will set the value of that cell as something. For example:
C 10 First Name
C 0 0 1 Middle Initial || = ?
C 10 0 20 Last Name
Field-definition file for SimTel archives:
People who use the SimTel archives (ftp.coast.net or
http://www.coast.net/SimTel) may find the enclosed SIMIBM.DEF file useful. It
provides the field definitions for the standard SIMIBM.IDX file (from
subdirectory SimTel/filedocs, download simindex.zip). The file includes some
hints for dropping fields that don't seem to be useful as well as restricting
the listing to just those files that have been added since a given date.
Specifying parameters:
Parameters for this program can be set in the following ways. The last setting
encountered always wins:
- Read from an *.INI file (see BRUCEINI.DOC file),
- Through the use of an environmental variable (SET CONVERT=whatever), or
- From the command line (see "Syntax" below)
CONVERT.DOC 8 Apr 28, 1997
Syntax:
CONVERT infile [ outfile [ deffile ] ]
[ /INDEF=deffile | /-INDEF ] [ /OUTDEF=deffile | /OUTDEF | /-OUTDEF ]
[ /OVERWRITE | /-OVERWRITE | /APPEND | /OVERASK ]
[ /FROM FIXED | /FROM ASCII | /FROM DBF ] [ /DELETED ]
[ /TO FIXED | /TO ASCII | /TO WKS | /TO DBF ] [ /HEADER ]
[ /DELIMS=aroundstrings,aroundnums,betweenfields ] [ /BEEP ]
[ /INMISS=val ] [ /INMISSC=val ] [ /OUTMISS=val ] [ /OUTMISSC=val ]
[ /SKIP | /MISSING | /ABORT ] [ /FIRSTOBS=n ] [ /LASTOBS=n ] [ /-VER ]
[ /-NULLS ] [ /BINARY ] [ /SCAN=n ] [ /GAP=n ] [ /-CFILTERS ]
[ /Q | /Qn ] [ /MONO ] [ /Iinitfile | /-I ] [ /-ENV ] [ /? ] [ /?&H ]
"infile" is the file specification for the ASCII-delimited or fixed-field file
you want converted. You can specify a drive and path specification if
necessary. This parameter is required.
"outfile" is the file specification of the file you want to create. You can
specify a drive and path specification if necessary. If no outfile is
provided, the routine will presume you want the output file called the same
thing as the infile but you want the extension to be ".FIX" (if the output file
is a fixed-field file), ".PRN" (if the output file is ASCII-delimited), or
".WKS" (if the output file is to be in a WKS format).
"deffile" is the file specification for the input field-definition file. You
can specify a drive and path specification if necessary. If no deffile is
specifically provided, the routine will presume it is called the same thing as
the infile but it has the extension of ".DEF". Note that the deffile can only
be provided this way if you also specify the outfile name; the use of
/INDEF=deffile is recommended instead.
"/INDEF=deffile" provides the name of the field-definition file to be read by
the program. If no deffile is specifically provided and you're using FROM
ASCII, the routine will presume the field-definition file exists and is named
the same thing as the infile but it has the extension of ".DEF". A deffile has
to be specifically provided for FROM FIXED files if one is desired.
"/-INDEF" says there is no field-definition file. This is the default if
you're using FROM DBF or FROM FIXED.
"/OUTDEF=deffile" provides the name of the output file that you want the
program to write the field-definition file to. This is useful in cases where
you did *not* use a field-definition file on input since it allows you to see
and possibly modify the field-definition file for next time.
"/OUTDEF" says to create the field-definition file and automatically name it
for you. The file name is the same as the infile with an extension of ".DEF".
"/-OUTDEF" says to skip the creation of the field-definition file. This is the
default for ASCII-delimited and fixed-field files.
CONVERT.DOC 9 Apr 28, 1997
"/OVERWRITE" says to overwrite the output file if it exists already.
"/-OVERWRITE" says to abort if the output file exists already.
"/APPEND" says to append (add) to the output file if it exists already. This
option is only available if you're creating either a fixed-field or ASCII-
delimited output file.
"/OVERASK" says to ask if the output file exists already. This is initially
the default.
"/FROM FIXED", "/FROM ASCII", and "/FROM DBF" specifies the format for the
input file. The routine usually reads the input file and guesses its format
for you. If the routine guesses incorrectly, use the /-VER parameter (below)
to overrule it.
"/DELETED" applies to dBase input files only. It says you want to retain
records tagged as "deleted". Otherwise, they're dropped in the output file.
"/-DELETED" applies to dBase input files only. It says to drop records tagged
as "deleted". This is initially the default.
"/TO FIXED", "/TO ASCII", "/TO WKS", and "/TO DBF" tells the routine what sort
of output file you'd like to create. The input and output file formats can be
the same if desired but you have to explicity specify an output file name in
that case. Initially defaults to "/TO FIX".
"/HEADER" is used in conjunction with WKS output files. If /HEADER is in
effect, the first row of the spreadsheet will contain the variable name for the
cell as provided in your control file. If none are provided, the field names
will be FIELD_01 onward. Note that the header line (if any) will show up in
the output counts. Initially defaults to "/-HEADER".
"/-HEADER" writes only data to the output WKS file, instead of reserving the
first record for the name of the variable. This is initially the default.
"/DELIMS=aroundstrings,aroundnums,betweenfields" allows you to specify the
delimiters (in sequence) around string fields, around numeric fields (any
fields that isn't a character field), and between fields. Defaults to:
/DELIMS=",,,
(Use quotes around character strings, nothing around numeric data, and the
third comma indicates that there is a comma between fields.) The replacement
string can include hexadecimal codes (in the &Hxx format) or decimal codes (in
the \ddd format) if necessary (see BRUCEHEX.DOC file) so either of the
following would put a tab between fields:
/DELIMS=",,&H09
/DELIMS=",,\009
"/BEEP" beeps when the program is finished.
"/-BEEP" reverses /BEEP. Initially the default.
CONVERT.DOC 10 Apr 28, 1997
"/INMISS=val" specifies that any numeric value that has the character string
representation of "val" will be considered missing. Note that this is an exact
character string comparison so /INMISS=1 will not compare to a value of "1.00".
Defaults to /INMISS=NULL (which translates as spaces).
"/INMISSC=val" specifies that any character string value that has the value of
"val" will be considered missing. Defaults to /INMISSC=NULL (which translates
as spaces).
"/OUTMISS=val" specifies that any missing numeric input value will be
translated on output as "val". For example, "/OUTMISS=N.A." would fill in
"N.A." for each missing value. Defaults to /OUTMISS=NULL (which translates as
spaces).
"/OUTMISSC=val" specifies that any missing character input value will be
translated on output as "val". Defaults to /OUTMISSC=NULL (which translates as
spaces).
"/SKIP" says to skip records with bad data values; otherwise the routine aborts
when it runs into any. /SKIP, /MISSING, and /ABORT are mutually exclusive.
"/MISSING" says to presume any missing fields in an ASCII-delimited record
should be filled in with blanks (for character fields) and 0 for numeric
fields. Incomplete records are written out (unlike in /SKIP). /SKIP,
/MISSING, and /ABORT are mutually exclusive. Note that the program will only
print out the first "bad" record. There may be others that show up after this
one.
"/ABORT" says to abort when you run into bad records. Initially the default.
/SKIP, /MISSING, and /ABORT are mutually exclusive.
"/FIRSTOBS=n" says to start reading the data beginning with record number n.
Initially defaults to "/FIRSTOBS=1".
"/LASTOBS=n" says to stop reading the data after record number n. Initially
defaults to "/LASTOBS=2000000000" (2 billion).
"/-VER" is used when the program verifies your input file and mistakenly
determines that it is a file type other than what it is. This sometimes
happens with fixed-field input files even when /FROM FIXED is specified.
"/NULLS" allows fields that begin with decimal 0 to be left in the output file
as valid values. Otherwise, they're treated as being missing. /NULLS is
initially the default.
"/-NULLS" translates any field which begins with the decimal 0 value as
missing. For character fields, it's translated to the value of INMISSC,
numeric fields are switched to the INMISS value. /NULLS is initially the
default.
CONVERT.DOC 11 Apr 28, 1997
"/BINARY" says that you have a fixed-field input file and that the physical
records in this file do not end with the normal CR/LF combination. This is
fairly typical of files created on a mainframe or copied from a tape. If
/BINARY is used, every byte must be precisely accounted for. /BINARY
automatically invokes the /FROM FIXED option.
"/-BINARY" says that every record ends with a CR/LF combination. This is
fairly standard on files created on a PC, even fixed-field files. When
/-BINARY is in effect, trailing fields are ignored. "/-BINARY" is initially
the default.
"/SCAN=n" says to read the first n-records when determining the input file
characteristics. This is only relevant for /FROM ASCII files which use the
/-INDEF option. The field types are based on the first record read but the
maximum field widths are determined by reading the first n-records. Initially
defaults to "/SCAN=100".
"/GAP=n" specifies to add n-characters to each output field width. This is
only relevant for /TO FIXED fields which use the /-INDEF option. In this case,
n-characters are added to each output field length for non-numeric fields and
extra blank fields are added before every numeric field. This is an easy way
of splitting up columns on display. If you want non-uniform gaps, you can
always process the file twice; specify /-INDEF with /OUTDEF the first time,
modify the field-definition file by hand, and then reprocess the file with
/INDEF=filename. Initially defaults to "/GAP=0".
"/CFILTERS" says that if cell filters are specified in the field-definition
file (for example, "N 3 Age || > 5"), these cell filters are to be respected.
This is initially the default. (If cell filters do not appear in the
field-definition file at all, the switch is ignored.)
"/-CFILTERS" says that if cell filters are specified in the field-definition
file, they are to be ignored. This is useful sometimes if you want the whole
file instead of just getting a subset of it. Note that this switch will also
turn off initializing new fields (as in "N 0 0 3 Age || = 12").
"/Q" turns off the record-by-record status report.
"/Qn" shows a status message every n-number of records. The default is "/Q25".
"/MONO" (or "/-COLOR") does not try to override screen colors. Initially
defaults to "/COLOR".
"/COLOR" (or "/-MONO") allows screen colors to be overridden. This is
initially the default.
"/Iinitfile" says to read an initialization file with the file name "initfile".
The file specification *must* contain a period. Initfiles are described in the
BRUCEINI.DOC file. Initially defaults to "/ICONVERT.INI".
"/-I" (or "/INULL") says to skip loading the initialization file.
CONVERT.DOC 12 Apr 28, 1997
"/ENV" says to look for %var% occurrences in the command line and try to
resolve any apparent environmental variable references. See BRUCEINI.DOC for
more information. This is initially the default.
"/-ENV" says to skip resolving apparent %var% occurrences in the command line.
Initially defaults to "/ENV".
"/?" or "/HELP" or "HELP" shows you the syntax for the command.
"/?&H" gives you a hexadecimal and decimal conversion table.
Return codes:
CONVERT returns the following ERRORLEVEL codes:
0 = no problems, file converted
250 = operation aborted by pressing Escape
251 = other problems
252 = problems with input data and /ABORT specified
253 = problems with INDEF or OUTDEF file
254 = could not find a decent temporary output subdirectory
255 = syntax problems, file(s) not found, output file already exists,
/? requested
CONVERT.DOC 13 Apr 28, 1997
Restrictions and Caveats:
* The program skips all Memo fields in dBase files on input.
* Most spreadsheet programs restrict a given field length to being 240
characters or less.
* Date fields which are in the form "yy-mm-dd" (instead of "yyyymmdd") should
be declared as character fields instead of date fields.
* Date fields converted for WKS files are changed into strings of the
"yy-mm-dd" format.
* Some users have reported problems using this program when their default
drive is a network drive instead of a local hard drive. If you get a
"path/file access error", make C: your default drive and try again.
Author:
Bruce Guthrie
Room H-4885
U.S. Dept of Commerce/ESA/STAT-USA
Washington, DC 20230
voice: (202) 482-3234
e-mail: bguthrie@doc.gov
You may freely copy and re-distribute this program; however, the U.S.
Department of Commerce neither guarantees nor assures compatibility of the
program with all computer software or hardware.
Additional information about this and other Bruce Guthrie programs can be found
in the file BRUCE.DOC which should be included in the original ZIP file. The
recent change history for this and the other programs is provided in the
HISTORY.ymm file which should be in the same ZIP file where "y" is replaced by
the last digit of the year and "mm" is the two digit month of the release;
HISTORY.611 came out in November 1996. This same naming convention is used in
naming the ZIP file (CONVRymm.ZIP) that this program was included in.
Please provide an Internet e-mail address on all correspondence.